If you are using the packages for the first time, you will first have to install them.
# install.packages("survival")
# install.packages("memisc")
If you have already installed these packages in the current version of R, you only have to load them.
library(survival)
## Warning: package 'survival' was built under R version 4.0.4
library(memisc)
## Warning: package 'memisc' was built under R version 4.0.3
## Loading required package: lattice
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 4.0.4
##
## Attaching package: 'memisc'
## The following objects are masked from 'package:stats':
##
## contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
##
## as.array
Load a data set from a package.
You can use the double colon operator (::) to return the pbc object from the package survival. We store this data set in an object with the name pbc.
pbc <- survival::pbc
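The same `package::object` pattern works for any installed package, whether or not it has been attached with library(). The sketch below uses the built-in datasets and stats packages instead of survival, purely so that it runs on a fresh R installation; the object name cars_df is made up for this illustration.

```r
# Access an object from a package without attaching the package.
# 'datasets' and 'stats' ship with R; 'cars_df' is a hypothetical name.
cars_df <- datasets::cars

# Functions can be called the same way.
m <- stats::median(c(1, 3, 5))

dim(cars_df)  # number of rows and columns of the data set
m
```

Writing the package name explicitly also makes clear which function is meant when two attached packages define the same name, as in the masking messages printed when memisc is loaded.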
What is the mean and standard deviation for the variable age
of the pbc data set?
mean(x = pbc$age)
## [1] 50.74155
mean(x = pbc$age, na.rm = TRUE)
## [1] 50.74155
sd(x = pbc$age)
## [1] 10.44721
What is the mean and variance for the variable chol
of the pbc data set?
mean(x = pbc$chol)
## [1] NA
mean(x = pbc$chol, na.rm = TRUE)
## [1] 369.5106
var(x = pbc$chol, na.rm = TRUE)
## [1] 53798.27
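The difference between the two mean() calls above comes from the na.rm argument: by default a single missing value makes the result NA. A minimal sketch on a made-up vector (not taken from pbc) shows the behaviour:

```r
# Hypothetical toy vector with one missing value.
x <- c(2, 4, NA, 6)

mean_with_na <- mean(x)                # NA: the missing value propagates
mean_no_na   <- mean(x, na.rm = TRUE)  # mean of 2, 4 and 6
var_no_na    <- var(x, na.rm = TRUE)   # sample variance of 2, 4 and 6
sd_no_na     <- sd(x, na.rm = TRUE)    # square root of the variance

# Number of values that na.rm = TRUE silently drops.
n_missing <- sum(is.na(x))

c(mean = mean_no_na, var = var_no_na, sd = sd_no_na, missing = n_missing)
```

It is usually worth checking how many values were dropped before reporting a summary computed with na.rm = TRUE.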
What is the median and interquartile range for the variable age
of the pbc data set?
median(x = pbc$age)
## [1] 51.00068
IQR(x = pbc$age)
## [1] 15.40862
What is the min and max of the variable age
of the pbc data set?
min(x = pbc$age)
## [1] 26.27789
max(x = pbc$age)
## [1] 78.43943
range(x = pbc$age)
## [1] 26.27789 78.43943
What are the 10th, 25th, 50th, 75th and 90th percentiles for serum bilirubin
of the pbc data set?
quantile(x = pbc$bili, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
## 10% 25% 50% 75% 90%
## 0.60 0.80 1.40 3.40 8.03
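The median and the interquartile range computed earlier are special cases of quantile(). A small sketch on a made-up vector, assuming the default quantile type, illustrates the link:

```r
# Hypothetical toy data: the integers 1 to 9.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 25th, 50th and 75th percentiles (default type = 7).
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# The median is the 50th percentile.
med_from_q <- unname(q["50%"])

# The IQR is the distance between the 75th and 25th percentiles.
iqr_from_q <- unname(q["75%"] - q["25%"])

c(median = med_from_q, IQR = iqr_from_q)
```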
The functions colMeans() and rowMeans() allow us to calculate the mean for each column or each row of a matrix or data.frame, e.g.:
colMeans(x = data.frame(bili = pbc$bili, chol = pbc$chol), na.rm = TRUE)
## bili chol
## 3.220813 369.510563
rowMeans(x = data.frame(bili = pbc$bili, chol = pbc$chol), na.rm = TRUE)
## [1] 137.75 151.55 88.70 122.90 141.20 124.40 161.50 140.15 282.60 106.30
## [11] 130.20 119.80 140.85 0.80 115.90 102.35 138.35 94.70 117.85 189.55
## [21] 126.30 137.20 206.20 229.05 149.35 566.60 98.30 119.60 185.35 131.80
## [31] 150.35 131.90 105.40 182.40 157.60 86.15 170.55 193.15 141.35 1.30
## [41] 6.80 2.10 181.05 151.15 0.60 243.85 158.25 130.45 0.80 129.05
## [51] 138.40 310.00 2.60 144.65 208.90 249.55 131.15 121.35 164.90 302.45
## [61] 108.30 151.65 477.25 187.55 128.60 214.20 233.55 87.35 336.00 0.60
## [71] 129.60 160.25 66.35 283.20 345.55 203.10 125.30 221.15 157.90 127.10
## [81] 231.20 238.25 125.65 131.70 132.05 802.50 173.05 148.30 205.00 330.80
## [91] 165.00 103.70 177.15 102.10 17.40 1.00 211.00 120.00 230.90 90.15
## [101] 200.45 124.45 95.25 152.05 232.55 2.10 106.30 63.70 60.25 243.95
## [111] 266.75 134.50 190.35 131.10 151.85 230.50 478.25 196.75 318.30 164.25
## [121] 76.15 149.30 5.10 125.80 158.65 135.10 134.25 16.20 210.45 896.20
## [131] 122.40 224.95 166.25 289.35 131.70 131.90 200.05 216.65 164.55 145.55
## [141] 173.45 182.50 167.45 292.00 154.85 1.20 144.60 511.10 130.00 1.00
## [151] 230.45 294.15 108.75 85.20 110.30 191.75 143.30 226.70 159.75 108.80
## [161] 252.15 131.60 116.65 8.50 100.00 742.85 188.45 128.70 204.65 195.60
## [171] 0.50 103.15 119.50 0.50 141.90 3.20 129.45 0.60 198.90 241.35
## [181] 124.70 0.60 100.75 342.50 128.40 113.50 411.00 93.85 180.65 2.30
## [191] 558.25 154.45 471.40 147.25 175.35 113.70 133.30 143.35 197.05 120.35
## [201] 117.80 111.75 74.75 127.85 192.25 106.80 0.60 199.95 126.35 173.45
## [211] 1.30 116.60 200.25 202.45 640.95 0.50 309.70 0.50 108.30 214.90
## [221] 180.45 188.25 231.05 155.00 137.35 111.75 159.15 107.85 97.75 152.65
## [231] 260.70 133.70 257.45 289.45 674.50 127.25 221.80 140.30 150.40 116.20
## [241] 160.20 177.95 238.00 176.95 136.80 194.55 859.05 162.40 121.65 149.80
## [251] 113.75 123.55 125.05 115.05 96.85 168.55 140.25 207.55 140.05 118.80
## [261] 189.10 162.40 216.55 179.70 175.75 159.25 114.30 175.20 188.80 224.50
## [271] 161.00 113.25 165.10 1.60 287.10 110.00 159.00 171.80 99.25 163.30
## [281] 96.45 152.65 206.55 146.15 126.90 156.00 189.70 159.35 210.00 147.70
## [291] 171.10 277.30 101.25 503.30 324.20 164.40 138.10 170.55 172.20 5.20
## [301] 197.00 167.85 186.50 109.75 214.45 119.80 136.90 123.20 130.20 217.85
## [311] 124.50 291.20 0.70 1.40 0.70 0.70 0.80 0.70 5.00 0.40
## [321] 1.30 1.10 0.60 0.60 1.80 1.50 1.20 1.00 0.70 3.50
## [331] 3.10 12.60 2.80 7.10 0.60 2.10 1.80 16.00 0.60 5.40
## [341] 9.00 0.90 11.10 8.90 0.50 0.60 3.40 0.90 1.40 2.10
## [351] 15.00 0.60 1.30 1.30 1.60 2.20 3.00 0.80 0.80 1.80
## [361] 5.50 18.00 0.60 2.70 0.90 1.30 1.10 13.80 4.40 16.00
## [371] 7.30 0.60 0.70 0.70 1.70 9.50 2.20 1.80 3.30 2.90
## [381] 1.70 14.00 0.80 1.30 0.70 1.70 13.60 0.90 0.70 3.00
## [391] 1.20 0.40 0.70 2.00 1.40 1.60 0.50 7.30 8.10 0.50
## [401] 4.20 0.80 2.50 4.60 1.00 4.50 1.10 1.90 0.70 1.50
## [411] 0.60 1.00 0.70 1.20 0.90 1.60 0.80 0.70
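With na.rm = TRUE, a row mean is taken over the non-missing entries only, which is why rows with a missing chol value above simply return the bili value. The sketch below reproduces this on a made-up data frame (the names toy, a and b are hypothetical):

```r
# Hypothetical toy data frame with one missing value in column b.
toy <- data.frame(a = c(1, 2, 3), b = c(10, NA, 30))

col_means <- colMeans(toy, na.rm = TRUE)  # one value per column
row_means <- rowMeans(toy, na.rm = TRUE)  # one value per row

# Without na.rm, the incomplete row yields NA instead.
row_means_strict <- rowMeans(toy)

col_means
row_means
row_means_strict

# For long results, head() prints only the first few values.
head(row_means, n = 2)
```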
The functions colSums() and rowSums() allow us to calculate the sum for each column or each row of a matrix or data.frame, e.g.:
colSums(x = data.frame(bili = pbc$bili, chol = pbc$chol), na.rm = TRUE)
## bili chol
## 1346.3 104941.0
rowSums(x = data.frame(bili = pbc$bili, chol = pbc$chol), na.rm = TRUE)
## [1] 275.5 303.1 177.4 245.8 282.4 248.8 323.0 280.3 565.2 212.6
## [11] 260.4 239.6 281.7 0.8 231.8 204.7 276.7 189.4 235.7 379.1
## [21] 252.6 274.4 412.4 458.1 298.7 1133.2 196.6 239.2 370.7 263.6
## [31] 300.7 263.8 210.8 364.8 315.2 172.3 341.1 386.3 282.7 1.3
## [41] 6.8 2.1 362.1 302.3 0.6 487.7 316.5 260.9 0.8 258.1
## [51] 276.8 620.0 2.6 289.3 417.8 499.1 262.3 242.7 329.8 604.9
## [61] 216.6 303.3 954.5 375.1 257.2 428.4 467.1 174.7 672.0 0.6
## [71] 259.2 320.5 132.7 566.4 691.1 406.2 250.6 442.3 315.8 254.2
## [81] 462.4 476.5 251.3 263.4 264.1 1605.0 346.1 296.6 410.0 661.6
## [91] 330.0 207.4 354.3 204.2 17.4 1.0 422.0 240.0 461.8 180.3
## [101] 400.9 248.9 190.5 304.1 465.1 2.1 212.6 127.4 120.5 487.9
## [111] 533.5 269.0 380.7 262.2 303.7 461.0 956.5 393.5 636.6 328.5
## [121] 152.3 298.6 5.1 251.6 317.3 270.2 268.5 16.2 420.9 1792.4
## [131] 244.8 449.9 332.5 578.7 263.4 263.8 400.1 433.3 329.1 291.1
## [141] 346.9 365.0 334.9 584.0 309.7 1.2 289.2 1022.2 260.0 1.0
## [151] 460.9 588.3 217.5 170.4 220.6 383.5 286.6 453.4 319.5 217.6
## [161] 504.3 263.2 233.3 8.5 200.0 1485.7 376.9 257.4 409.3 391.2
## [171] 0.5 206.3 239.0 0.5 283.8 3.2 258.9 0.6 397.8 482.7
## [181] 249.4 0.6 201.5 685.0 256.8 227.0 822.0 187.7 361.3 2.3
## [191] 1116.5 308.9 942.8 294.5 350.7 227.4 266.6 286.7 394.1 240.7
## [201] 235.6 223.5 149.5 255.7 384.5 213.6 0.6 399.9 252.7 346.9
## [211] 1.3 233.2 400.5 404.9 1281.9 0.5 619.4 0.5 216.6 429.8
## [221] 360.9 376.5 462.1 310.0 274.7 223.5 318.3 215.7 195.5 305.3
## [231] 521.4 267.4 514.9 578.9 1349.0 254.5 443.6 280.6 300.8 232.4
## [241] 320.4 355.9 476.0 353.9 273.6 389.1 1718.1 324.8 243.3 299.6
## [251] 227.5 247.1 250.1 230.1 193.7 337.1 280.5 415.1 280.1 237.6
## [261] 378.2 324.8 433.1 359.4 351.5 318.5 228.6 350.4 377.6 449.0
## [271] 322.0 226.5 330.2 1.6 574.2 220.0 318.0 343.6 198.5 326.6
## [281] 192.9 305.3 413.1 292.3 253.8 312.0 379.4 318.7 420.0 295.4
## [291] 342.2 554.6 202.5 1006.6 648.4 328.8 276.2 341.1 344.4 5.2
## [301] 394.0 335.7 373.0 219.5 428.9 239.6 273.8 246.4 260.4 435.7
## [311] 249.0 582.4 0.7 1.4 0.7 0.7 0.8 0.7 5.0 0.4
## [321] 1.3 1.1 0.6 0.6 1.8 1.5 1.2 1.0 0.7 3.5
## [331] 3.1 12.6 2.8 7.1 0.6 2.1 1.8 16.0 0.6 5.4
## [341] 9.0 0.9 11.1 8.9 0.5 0.6 3.4 0.9 1.4 2.1
## [351] 15.0 0.6 1.3 1.3 1.6 2.2 3.0 0.8 0.8 1.8
## [361] 5.5 18.0 0.6 2.7 0.9 1.3 1.1 13.8 4.4 16.0
## [371] 7.3 0.6 0.7 0.7 1.7 9.5 2.2 1.8 3.3 2.9
## [381] 1.7 14.0 0.8 1.3 0.7 1.7 13.6 0.9 0.7 3.0
## [391] 1.2 0.4 0.7 2.0 1.4 1.6 0.5 7.3 8.1 0.5
## [401] 4.2 0.8 2.5 4.6 1.0 4.5 1.1 1.9 0.7 1.5
## [411] 0.6 1.0 0.7 1.2 0.9 1.6 0.8 0.7
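Because is.na() applied to a data frame returns a logical matrix, colSums() and rowSums() can also be used to count missing values per variable or per observation. A minimal sketch on a made-up data frame:

```r
# Hypothetical toy data frame with missing values.
toy <- data.frame(a = c(1, NA, 3), b = c(10, NA, 30), c = c(NA, 5, 6))

# TRUE counts as 1, so the sums are counts of missing values.
na_per_column <- colSums(is.na(toy))
na_per_row    <- rowSums(is.na(toy))

na_per_column
na_per_row
```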
What is the correlation between serum bilirubin
and serum cholesterol
of the pbc data set?
cor(x = pbc$bili, pbc$chol, use = "complete.obs", method = "pearson")
## [1] 0.3971289
cor(x = pbc$bili, pbc$chol, use = "complete.obs", method = "spearman")
## [1] 0.4024538
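The two methods answer slightly different questions: Pearson measures linear association, while Spearman is the Pearson correlation of the ranks and therefore measures monotone association. A sketch on made-up vectors:

```r
# Hypothetical toy data: y increases with x, but not linearly.
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 9, 16, 100)

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Spearman's coefficient equals Pearson's coefficient on the ranks.
spearman_by_hand <- cor(rank(x), rank(y))

c(pearson = pearson, spearman = spearman, by_hand = spearman_by_hand)
```

Since y is strictly increasing in x, the Spearman coefficient is 1 even though the Pearson coefficient is smaller.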
What is the correlation matrix for the variables serum bilirubin, serum cholesterol and albumin of the pbc data set?
cor(x = data.frame(pbc$bili, pbc$chol, pbc$albumin),
use = "complete.obs")
## pbc.bili pbc.chol pbc.albumin
## pbc.bili 1.0000000 0.39712889 -0.31310846
## pbc.chol 0.3971289 1.00000000 -0.06973277
## pbc.albumin -0.3131085 -0.06973277 1.00000000
What is the variance-covariance matrix for the above variables?
var(x = data.frame(pbc$bili, pbc$chol, pbc$albumin),
use = "complete.obs")
## pbc.bili pbc.chol pbc.albumin
## pbc.bili 20.7116508 419.201667 -0.5752636
## pbc.chol 419.2016672 53798.271973 -6.5295880
## pbc.albumin -0.5752636 -6.529588 0.1629782
cov(x = data.frame(pbc$bili, pbc$chol, pbc$albumin),
use = "complete.obs")
## pbc.bili pbc.chol pbc.albumin
## pbc.bili 20.7116508 419.201667 -0.5752636
## pbc.chol 419.2016672 53798.271973 -6.5295880
## pbc.albumin -0.5752636 -6.529588 0.1629782
A (co)variance matrix can be converted to a (Pearson) correlation matrix with the help of the function cov2cor():
cov2cor(V = var(x = data.frame(pbc$bili, pbc$chol, pbc$albumin),
use = "complete.obs"))
## pbc.bili pbc.chol pbc.albumin
## pbc.bili 1.0000000 0.39712889 -0.31310846
## pbc.chol 0.3971289 1.00000000 -0.06973277
## pbc.albumin -0.3131085 -0.06973277 1.00000000
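The conversion divides each covariance by the product of the two corresponding standard deviations. The following sketch checks this by hand on a made-up data frame (the names toy, u, v and w are hypothetical):

```r
# Hypothetical toy data with three numeric variables.
toy <- data.frame(u = c(1, 2, 3, 4),
                  v = c(2, 1, 4, 3),
                  w = c(4, 3, 2, 0))

V <- cov(toy)        # variance-covariance matrix
s <- sqrt(diag(V))   # standard deviations from the diagonal

# Element (i, j) is cov(i, j) / (sd(i) * sd(j)).
R_by_hand <- V / outer(s, s)

all.equal(R_by_hand, cov2cor(V))  # same as cov2cor()
all.equal(cov2cor(V), cor(toy))   # same as cor() on the data
```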
What is the percentage of placebo and treatment patients in the pbc data set? (In order to use the percent() function you will need to load the memisc package.)
percent(x = pbc$trt)
## 1 2 N
## 50.64103 49.35897 312.00000
What is the percentage of females
and males
in the pbc data set?
percent(x = pbc$sex)
## m f N
## 10.52632 89.47368 418.00000
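If memisc is not available, similar percentages can be obtained in base R by combining table() and prop.table(). This is a sketch of an alternative on a made-up factor, not the method used by percent() itself:

```r
# Hypothetical toy factor with the same level order as pbc$sex.
sex <- factor(c("f", "f", "m", "f"), levels = c("m", "f"))

counts <- table(sex)                # frequencies per level
pct    <- 100 * prop.table(counts)  # percentages per level
n      <- sum(counts)               # number of non-missing values

pct
n
```

Note that table() drops missing values by default, so the percentages refer to the non-missing observations.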
What are the frequencies of each combination for the variables trt
and sex
in the pbc data set?
table(trt = pbc$trt, sex = pbc$sex)
## sex
## trt m f
## 1 21 137
## 2 15 139
To add summaries (e.g. the sum) for each column and/or row use the addmargins()
function:
tab <- table(trt = pbc$trt, sex = pbc$sex)
addmargins(A = tab)
## sex
## trt m f Sum
## 1 21 137 158
## 2 15 139 154
## Sum 36 276 312
We can also change the function (e.g. use the mean):
addmargins(A = tab, FUN = mean)
## Margins computed over dimensions
## in the following order:
## 1: trt
## 2: sex
## sex
## trt m f mean
## 1 21 137 79
## 2 15 139 77
## mean 18 138 78
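addmargins() also has a margin argument that restricts the summary to one dimension. A minimal sketch on a made-up two-way table (the variables group and flag are hypothetical):

```r
# Hypothetical toy two-way table.
tab <- table(group = c("a", "a", "b", "b", "b"),
             flag  = c("no", "yes", "no", "no", "yes"))

# margin = 1 sums over the first dimension: adds a 'Sum' row.
with_sum_row <- addmargins(tab, margin = 1)

# margin = 2 sums over the second dimension: adds a 'Sum' column.
with_sum_col <- addmargins(tab, margin = 2)

with_sum_row
with_sum_col
```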
What are the proportions of each combination for the variables trt and sex in the pbc data set? (Multiply by 100 to obtain percentages.)
prop.table(x = tab)
## sex
## trt m f
## 1 0.06730769 0.43910256
## 2 0.04807692 0.44551282
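By default prop.table() divides by the grand total. Its margin argument gives proportions within rows or within columns instead. A sketch on a made-up table (the variables group and flag are hypothetical):

```r
# Hypothetical toy two-way table.
tab <- table(group = c("a", "a", "b", "b", "b"),
             flag  = c("no", "yes", "no", "no", "yes"))

overall <- prop.table(tab)              # all cells sum to 1
by_row  <- prop.table(tab, margin = 1)  # each row sums to 1
by_col  <- prop.table(tab, margin = 2)  # each column sums to 1

# Percentages rounded to one decimal place.
pct <- round(100 * overall, 1)

by_row
pct
```

Which margin is appropriate depends on the question, e.g. row proportions answer "what share of each group has the flag?".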
For tables with more than 2 dimensions use:
ftable(x = table(trt = pbc$trt, sex = pbc$sex, ascites = pbc$ascites))
## ascites 0 1
## trt sex
## 1 m 20 1
## f 124 13
## 2 m 13 2
## f 131 8
ftable(x = data.frame(trt = pbc$trt, sex = pbc$sex, ascites = pbc$ascites))
## ascites 0 1
## trt sex
## 1 m 20 1
## f 124 13
## 2 m 13 2
## f 131 8
With the help of the arguments row.vars and col.vars we can determine which variables are given in the rows and which in the columns:
ftable(x = table(trt = pbc$trt, sex = pbc$sex, ascites = pbc$ascites),
row.vars = c(3, 2))
## trt 1 2
## ascites sex
## 0 m 20 13
## f 124 131
## 1 m 1 2
## f 13 8
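The row.vars and col.vars arguments accept variable names as well as positions, which can be easier to read. The sketch below uses made-up values for three hypothetical vectors that mimic the variable names above:

```r
# Hypothetical toy data for a three-way table.
trt     <- c(1, 1, 1, 2, 2, 2, 2, 1)
sex     <- c("m", "f", "f", "m", "f", "f", "m", "f")
ascites <- c(0, 0, 1, 0, 1, 0, 0, 0)

tab3 <- table(trt = trt, sex = sex, ascites = ascites)

# Row variables chosen by position ...
by_index <- ftable(tab3, row.vars = c(3, 2))

# ... or, equivalently, by name.
by_name <- ftable(tab3, row.vars = c("ascites", "sex"))

by_name
```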
Check if there are any missing values in the serum cholesterol
variable of the pbc data set:
is.na(pbc$chol)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
## [49] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
## [181] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [205] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
## [217] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [301] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [313] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [325] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [337] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [349] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [361] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [373] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [385] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [397] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [409] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
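The element-wise output of is.na() is hard to read for long vectors. A few base R helpers summarise it; the sketch below applies them to a made-up vector:

```r
# Hypothetical toy vector with two missing values.
x <- c(5, NA, 7, NA, 9)

any_missing   <- anyNA(x)          # is there at least one NA?
n_missing     <- sum(is.na(x))     # how many values are missing?
where_missing <- which(is.na(x))   # positions of the missing values
prop_missing  <- mean(is.na(x))    # share of missing values

any_missing
n_missing
where_missing
prop_missing
```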
Check which observations are complete (i.e. not missing) in the serum cholesterol variable of the pbc data set:
complete.cases(pbc$chol)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [37] TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE TRUE
## [49] FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [73] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
## [97] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [109] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [121] TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [133] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [145] TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [157] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [169] TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE
## [181] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [193] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [205] TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
## [217] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [229] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [241] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [253] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [265] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [277] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [289] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [301] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [313] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [349] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [361] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [373] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [397] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [409] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
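For a single variable complete.cases() is simply the negation of is.na(). It becomes more useful with several variables, where a row counts as complete only if none of them is missing. A sketch on a made-up data frame (toy and its columns are hypothetical):

```r
# Hypothetical toy data frame with missing values in two variables.
toy <- data.frame(id   = 1:4,
                  bili = c(1.2, NA, 0.8, 3.1),
                  chol = c(200, 250, NA, 180))

# TRUE only for rows where both bili and chol are observed.
ok <- complete.cases(toy[, c("bili", "chol")])

n_complete   <- sum(ok)   # number of complete rows
toy_complete <- toy[ok, ] # keep the complete rows only

ok
toy_complete
```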
Obtain the dimensions of a matrix or data frame. We can use the function dim():
dim(pbc)
## [1] 418 20
Outliers: e.g. let’s assume that patients with serum bilirubin values > 25 are outliers.
Serum bilirubin outliers:
pbc_out_bili <- pbc[pbc$bili > 25, ]
Calculate the mean and median of the serum bilirubin variable without the outliers:
pbc_no_out_bili <- pbc[pbc$bili <= 25, ]
mean(pbc_no_out_bili$bili)
## [1] 3.107692
median(pbc_no_out_bili$bili)
## [1] 1.35
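Subsetting with a logical condition works cleanly here because bili has no missing values. When the variable does contain NA, the condition evaluates to NA for those rows and the subset gains rows filled with NA. Wrapping the condition in which() avoids this, as the following sketch on a made-up data frame shows:

```r
# Hypothetical toy data frame with one missing bili value.
toy <- data.frame(id = 1:5, bili = c(0.8, 30, NA, 1.4, 26))

# Naive subsetting: the NA condition produces an all-NA row.
out_naive <- toy[toy$bili > 25, ]

# which() keeps only the positions where the condition is TRUE.
out_safe <- toy[which(toy$bili > 25), ]
no_out   <- toy[which(toy$bili <= 25), ]

nrow(out_naive)  # includes the spurious NA row
out_safe
no_out
```

Note that rows with a missing value end up in neither out_safe nor no_out, so they need to be handled separately if they matter for the analysis.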
Calculate the mean and median of the serum bilirubin
variable without the missing values in the serum cholesterol
variable:
pbc_no_mis_chol <- pbc[complete.cases(pbc$chol), ]
mean(pbc_no_mis_chol$bili)
## [1] 3.276056
median(pbc_no_mis_chol$bili)
## [1] 1.4